Stochastic Weight Averaging Revisited

Abstract

Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple-yet-effective approach to assist the backbone SGD in finding better optima, in terms of generalization. From a statistical perspective, weight-averaging contributes to variance reduction. Recently, a well-established stochastic weight-averaging (SWA) method was proposed, which featured the application of a cyclical or high-constant (CHC) learning-rate schedule for generating weight samples for weight-averaging. Then, a new insight on weight-averaging was introduced, which stated that the weight average assisted in discovering wider optima and resulted in better generalization. We conducted extensive experimental studies concerning SWA, involving 12 modern deep model architectures and open-source image, graph, and text datasets as benchmarks. We disentangled the contributions of the weight-averaging operation and the CHC learning-rate schedule, showing that SWA still contributed to variance reduction while exploring the parameter space more widely than the backbone SGD, which could be under-fitted due to a lack of training budget. We then presented an algorithm termed periodic SWA (PSWA), comprised of a series of weight-averaging operations that exploit the wide optima structures explored by the CHC learning-rate schedule, and we empirically demonstrated that PSWA outperformed its backbone SGD remarkably.
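The core mechanism the abstract describes can be illustrated with a minimal sketch: a running average of weight vectors sampled periodically from an SGD trajectory run at a high constant learning rate (the "high-constant" half of the CHC schedule). The toy noisy-quadratic loss, the sampling interval, and all variable names below are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])   # initial "network" weights (toy 2-D problem)
w_opt = np.zeros(2)         # optimum of the toy loss ||w||^2 / 2
lr = 0.1                    # high constant learning rate keeps SGD exploring

w_swa, n_avg = None, 0
for step in range(1, 2001):
    grad = w + rng.normal(scale=1.0, size=2)  # noisy gradient of the quadratic
    w = w - lr * grad
    if step % 50 == 0:  # sample one weight vector per "cycle"
        # running-mean update: w_swa <- (w_swa * n + w) / (n + 1)
        w_swa = w.copy() if w_swa is None else (w_swa * n_avg + w) / (n_avg + 1)
        n_avg += 1

# The averaged iterate typically lies much closer to the optimum than the
# last SGD iterate: this is the variance-reduction effect the abstract refers to.
print("last SGD iterate distance:", np.linalg.norm(w - w_opt))
print("SWA average distance:     ", np.linalg.norm(w_swa - w_opt))
```

Because the noisy iterates bounce around the optimum, their mean cancels much of the gradient noise; the paper's PSWA variant applies such averaging operations periodically rather than once at the end.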


Related articles

Stochastic Approximation with Averaging Innovation

The aim of the paper is to establish a convergence theorem for multi-dimensional stochastic approximation in a setting with innovations satisfying some averaging properties and to study some applications. The averaging assumptions allow us to unify the framework where the innovations are generated (to solve problems from Numerical Probability) and the one with exogenous innovations (market data...

Weighted averaging and stochastic approximation

We explore the relationship between weighted averaging and stochastic approximation algorithms and study their convergence via a sample path analysis. We prove that the convergence of a stochastic approximation algorithm is equivalent to the convergence of the weighted average of the associated noise sequence. We also present necessary and sufficient noise conditions for convergence of the averag...

Stochastic Pi-calculus Revisited

We develop a version of stochastic Pi-calculus endowed with a structural operational semantics expressed in terms of measure theory. The paper relies on two observations: (i) the structural congruence organizes a measurable space of processes and (ii) a well behaved SOS associates to each process, in a specified rate environment, a behaviour defined by a set of measures over the measurable spac...

Stochastic pyramid revisited

We focus in this paper on a new model for graph decimation which is based on the stochastic pyramid algorithm. In this study, we present some trends towards a better understanding of the new properties arising from the removal of the sequentiality constraint of this so-called technique.

Stochastic Einstein Locality Revisited

I discuss various formulations of stochastic Einstein locality (SEL), which is a version of the idea of relativistic causality, i.e. the idea that influences propagate at most as fast as light. SEL is similar to Reichenbach’s Principle of the Common Cause (PCC), and Bell’s Local Causality. My main aim is to discuss formulations of SEL for a fixed background spacetime. I previously argued that S...


Journal

Journal title: Applied sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13052935